Knowledge Distillation Using Hierarchical Self-Supervision Augmented Distribution
Knowledge distillation (KD) is an effective framework that aims to transfer
meaningful information from a large teacher to a smaller student. In general, KD
involves defining and transferring knowledge. Previous KD methods often
focus on mining various forms of knowledge, for example, feature maps and
refined information. However, the knowledge is derived from the primary
supervised task and thus is highly task-specific. Motivated by the recent
success of self-supervised representation learning, we propose an auxiliary
self-supervision augmented task to guide networks to learn more meaningful
features. Therefore, we can derive soft self-supervision augmented
distributions as richer dark knowledge from this task for KD. Unlike previous
knowledge, this distribution encodes joint knowledge from supervised and
self-supervised feature learning. Beyond knowledge exploration, we propose to
append several auxiliary branches at various hidden layers, to fully take
advantage of hierarchical feature maps. Each auxiliary branch is guided to
learn the self-supervision augmented task and to distill this distribution from
teacher to student. Overall, we call our KD method Hierarchical
Self-Supervision Augmented Knowledge Distillation (HSSAKD). Experiments on
standard image classification show that both offline and online HSSAKD achieve
state-of-the-art performance in the field of KD. Transfer experiments on object
detection further verify that HSSAKD can guide the network to learn better
features. The code is available at https://github.com/winycg/HSAKD.
Comment: 15 pages, accepted by IEEE Transactions on Neural Networks and Learning Systems 202
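To make the transferred knowledge concrete, here is a minimal sketch of distilling a
self-supervision augmented distribution, assuming rotation (0/90/180/270 degrees) as the
auxiliary self-supervised transform and a single auxiliary branch per network; the
function and variable names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ss_augmented_kd_loss(student_aux_logits, teacher_aux_logits, T=4.0):
    """KL divergence between softened teacher and student distributions over the
    joint (class x rotation) label space, i.e. one auxiliary branch's KD loss.
    Illustrative sketch, not the paper's exact implementation."""
    p_s = F.log_softmax(student_aux_logits / T, dim=1)
    p_t = F.softmax(teacher_aux_logits / T, dim=1)
    return F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)

# Hypothetical usage: each image in a batch of size B is rotated by
# {0, 90, 180, 270} degrees, so each auxiliary head predicts over
# num_classes * 4 joint labels for the 4B rotated inputs.
# aux_s = student_branch(student_features)   # (4B, num_classes * 4)
# aux_t = teacher_branch(teacher_features)   # (4B, num_classes * 4)
# loss = ss_augmented_kd_loss(aux_s, aux_t)
```

Because the auxiliary head predicts over the joint class-times-rotation label space, the
softened teacher distribution carries both supervised and self-supervised structure.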
E2Net: Resource-Efficient Continual Learning with Elastic Expansion Network
Continual Learning methods are designed to learn new tasks without erasing
previous knowledge. However, Continual Learning often requires massive
computational power and storage capacity for satisfactory performance. In this
paper, we propose a resource-efficient continual learning method called the
Elastic Expansion Network (E2Net). Leveraging core subnet distillation and
precise replay sample selection, E2Net achieves superior average accuracy and
diminished forgetting within the same computational and storage constraints,
all while minimizing processing time. In E2Net, we propose Representative
Network Distillation, which identifies the representative core subnet by assessing
parameter quantity and output similarity with the working network and distills
analogous subnets within the working network, mitigating reliance on rehearsal
buffers and facilitating knowledge transfer across previous tasks. To enhance
storage resource utilization, we then propose Subnet Constraint Experience
Replay to optimize rehearsal efficiency through a sample storage strategy based
on the structures of representative networks. Extensive experiments conducted
predominantly in cloud environments with diverse datasets, as well as in an edge
environment, demonstrate that E2Net consistently outperforms state-of-the-art
methods. In addition, our method outperforms competitors in terms of both storage
and computational requirements.
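As a rough illustration of the two components the abstract names, the sketch below scores
a candidate core subnet by its output similarity to the working network and its parameter
count, and then distills the working network into that subnet on replayed samples; the
scoring formula, weights, and names are assumptions, not the paper's actual procedure.

```python
import torch
import torch.nn.functional as F

def subnet_score(subnet_logits, full_logits, n_subnet_params, n_full_params,
                 alpha=0.5):
    """Score a candidate core subnet: higher output similarity to the working
    network is better, and fewer parameters are better; alpha (assumed) balances
    the two terms."""
    similarity = -F.kl_div(
        F.log_softmax(subnet_logits, dim=1),
        F.softmax(full_logits, dim=1),
        reduction="batchmean",
    )
    compactness = 1.0 - n_subnet_params / n_full_params
    return alpha * similarity + (1.0 - alpha) * compactness

def core_subnet_distill_loss(subnet_logits, working_logits):
    """Distill the working network's outputs into the selected core subnet on
    replayed samples, reducing reliance on a large rehearsal buffer."""
    return F.mse_loss(subnet_logits, working_logits.detach())
```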
Online Knowledge Distillation via Mutual Contrastive Learning for Visual Recognition
Teacher-free online Knowledge Distillation (KD) aims to train an ensemble of
multiple student models collaboratively, with the students distilling knowledge
from each other. Although existing online KD methods achieve desirable performance, they
often focus on class probabilities as the core knowledge type, ignoring valuable
feature-representation information. We present a Mutual Contrastive
Learning (MCL) framework for online KD. The core idea of MCL is to perform
mutual interaction and transfer of contrastive distributions among a cohort of
networks in an online manner. Our MCL can aggregate cross-network embedding
information and maximize a lower bound on the mutual information between two
networks. This enables each network to learn extra contrastive knowledge from
others, leading to better feature representations, thus improving the
performance of visual recognition tasks. Beyond the final layer, we extend MCL
to intermediate layers and employ an adaptive layer-matching mechanism trained
by meta-optimization. Experiments on image classification and transfer learning
to visual recognition tasks show that layer-wise MCL can lead to consistent
performance gains over state-of-the-art online KD approaches. This superiority
demonstrates that layer-wise MCL can guide the network to generate better
feature representations. Our code is publicly available at
https://github.com/winycg/L-MCL.
Comment: 18 pages, accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI-2023)
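As one concrete reading of "mutual interaction and transfer of contrastive distributions",
the sketch below uses a symmetric InfoNCE-style objective between two peer networks'
embeddings of the same batch; the exact form of MCL's contrastive distributions may
differ, and all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def mutual_contrastive_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE between two networks' embeddings of the same batch:
    matching indices are positives, all other pairs are negatives. Minimizing
    this loss tightens a lower bound on the mutual information between the two
    networks' representations. Illustrative sketch, not the authors' code."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    loss_ab = F.cross_entropy(logits, targets)      # network A contrasts against B
    loss_ba = F.cross_entropy(logits.t(), targets)  # and vice versa
    return 0.5 * (loss_ab + loss_ba)
```

Applying such a loss at intermediate layers as well as the final layer, with learned
layer-matching weights, is the layer-wise extension the abstract describes.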